Automated content scoring of spoken responses containing multiple parts with factual information
Authors
Abstract
This paper presents approaches to automated content scoring of spoken language test responses from non-native speakers of English that contain multiple parts addressing factual information the test taker has previously heard in auditory stimulus materials. While previous work on content scoring of spontaneous, unpredictable speech has focused only on entire responses and on general topic-matching approaches, such as content vector analysis, the specific nature of the spoken responses in our data requires response segmentation and the extraction of features that indicate the relevance and correctness of the facts contained in the different parts of the response. Our best content features, based on similarity with key facts and concepts, achieve correlations of r = 0.615 (for speech recognition output) and r = 0.637 (using human transcriptions) with expert human rater scores. Furthermore, we show that these content features outperform traditional vector-space-based features. Finally, we demonstrate that the performance of a scoring model combining previously developed features with some of the newly designed content features improves significantly from r = 0.624 to r = 0.664 on an unseen evaluation set when using speech recognition output.
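The key-fact similarity features described in the abstract can be illustrated with a minimal bag-of-words sketch: each segmented response part is compared against a list of expected key facts, and the best match is taken as that part's content score. This is an illustrative assumption, not the paper's actual feature set; the function names, tokenization, and scoring scheme here are simplified stand-ins.

```python
import math
from collections import Counter

def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine similarity between bag-of-words vectors of two texts."""
    va, vb = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    dot = sum(va[w] * vb[w] for w in set(va) & set(vb))
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0

def max_key_fact_similarity(response_part: str, key_facts: list[str]) -> float:
    """Score one response part by its best match against the expected key facts."""
    return max((cosine_similarity(response_part, f) for f in key_facts), default=0.0)

# Hypothetical usage: one part of a segmented spoken response, scored against
# the key facts listeners were expected to recall from the stimulus.
key_facts = ["the train departs at noon", "tickets cost ten dollars"]
score = max_key_fact_similarity("the train departs around noon", key_facts)
```

A real system would operate on speech recognition output, so features of this kind must tolerate recognition errors; simple token overlap, as here, degrades gracefully because partial matches still yield nonzero similarity.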
Similar resources
Prompt-based Content Scoring for Automated Spoken Language Assessment
This paper investigates the use of prompt-based content features for the automated assessment of spontaneous speech in a spoken language proficiency assessment. The results show that the single highest-performing prompt-based content feature measures the number of unique lexical types that overlap with the listening materials and are not contained in either the reading materials or a sample response,...
Scoring Spoken Responses Based on Content Accuracy
Content accuracy has not been fully utilized in previous studies on automated speaking assessment. Compared to writing tests, responses in speaking tests are noisy (due to recognition errors), full of incomplete sentences, and short. To handle these challenges for content scoring in speaking tests, we propose two new methods based on information extraction (IE) and machine learnin...
Modeling Discourse Coherence for the Automated Scoring of Spontaneous Spoken Responses
This study describes an approach for modeling the discourse coherence of spontaneous spoken responses in the context of automated assessment of non-native speech. Although the measurement of discourse coherence is typically a key metric in human scoring rubrics for assessments of spontaneous spoken language, little prior research has been done to assess a speaker’s coherence in the context of a...
Using an Ontology for Improved Automated Content Scoring of Spontaneous Non-Native Speech
This paper presents an exploration into automated content scoring of non-native spontaneous speech using ontology-based information to enhance a vector space approach. We use content vector analysis as a baseline and evaluate the correlations between human rater proficiency scores and two cosine-similarity-based features, previously used in the context of automated essay scoring. We use two ont...
Automated Content Scoring of Spoken Responses in an Assessment for Teachers of English
This paper presents and evaluates approaches to automatically score the content correctness of spoken responses in a new language test for teachers of English as a foreign language who are non-native speakers of English. Most existing tests of English spoken proficiency elicit responses that are either very constrained (e.g., reading a passage aloud) or are of a predominantly spontaneous nature...
Publication date: 2013